Attribute reduction of data with error ranges and test costs

Authors

  • Fan Min
  • William Zhu
Abstract

In data mining applications, a data item can often be obtained by several measurement methods, each with a different test cost and a different error range. Test costs refer to the time, money, or other resources spent in obtaining data items related to some object; observational errors correspond to differences between the measured and true values of a data item. In supervised learning, we need to decide which data items to obtain and which measurement methods to employ, so as to minimize the total test cost and help in constructing classifiers. This paper studies this problem in four steps. First, data models are built to address error ranges and test costs. Second, an error-range-based covering rough set is constructed to define lower and upper approximations, positive regions, and relative reducts. A closely related theory deals with the neighborhood rough set, which has been successfully applied to heterogeneous attribute reduction. The major difference between the two theories is the definition of neighborhood. Third, the minimal test cost attribute reduction problem is redefined in the new theory. Fourth, both backtrack and heuristic algorithms are proposed to deal with the new problem. The algorithms are tested on ten UCI (University of California – Irvine) datasets. Experimental results show that the backtrack algorithm is efficient on rational-sized datasets, the weighting mechanism for the heuristic information is effective, and the competition approach can improve the quality of the result significantly. This study suggests new research trends concerning attribute reduction and covering rough sets. © 2012 Elsevier Inc. All rights reserved.
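The ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the neighborhood rule (two measurements of an attribute are treated as indistinguishable when they differ by at most twice that attribute's error range), the function names, and the toy data are all assumptions made for illustration. The sketch computes a positive region from error-range neighborhoods, then uses a simple backtrack search with cost-based pruning to find a cheapest attribute subset that preserves it.

```python
from typing import FrozenSet, List, Sequence, Tuple

Row = Tuple[Sequence[float], str]  # (measured attribute values, class label)


def neighborhood(data: List[Row], x: int, attrs: Sequence[int],
                 err: Sequence[float]) -> List[int]:
    """Indices of objects indistinguishable from object x on attrs.

    Assumed rule: values of attribute a may share a true value when they
    differ by at most 2 * err[a].
    """
    vx = data[x][0]
    return [y for y, (vy, _) in enumerate(data)
            if all(abs(vx[a] - vy[a]) <= 2 * err[a] for a in attrs)]


def positive_region(data: List[Row], attrs: Sequence[int],
                    err: Sequence[float]) -> FrozenSet[int]:
    """Objects whose whole neighborhood carries a single class label."""
    pos = set()
    for x in range(len(data)):
        labels = {data[y][1] for y in neighborhood(data, x, attrs, err)}
        if len(labels) == 1:
            pos.add(x)
    return frozenset(pos)


def min_cost_subset(data: List[Row], err: Sequence[float],
                    cost: Sequence[float]) -> Tuple[float, Tuple[int, ...]]:
    """Backtrack over attribute subsets; prune branches that cannot be cheaper."""
    n_attrs = len(cost)
    all_attrs = list(range(n_attrs))
    target = positive_region(data, all_attrs, err)
    best = (sum(cost), tuple(all_attrs))  # the full set always qualifies

    def search(start: int, chosen: List[int], spent: float) -> None:
        nonlocal best
        if spent >= best[0]:
            return  # already no cheaper than the best subset found so far
        if positive_region(data, chosen, err) == target:
            best = (spent, tuple(chosen))
            return  # adding attributes would only raise the cost
        for a in range(start, n_attrs):
            search(a + 1, chosen + [a], spent + cost[a])

    search(0, [], 0)
    return best


# Hypothetical toy data: three attributes, two classes.
data: List[Row] = [
    ([1.0, 5.0, 0.2], "a"),
    ([1.1, 9.0, 0.9], "b"),
    ([3.0, 5.1, 0.3], "a"),
    ([3.1, 9.2, 0.8], "b"),
]
err = [0.1, 0.2, 0.1]   # assumed error range per attribute
cost = [2, 5, 3]        # assumed test cost per attribute

best_cost, best_attrs = min_cost_subset(data, err, cost)
print(best_cost, best_attrs)
```

On this toy data the cheapest attribute (index 0) cannot separate the classes within its error range, so the search settles on attribute 2 alone. The exhaustive search is exponential in the number of attributes, which is consistent with the abstract's remark that a heuristic algorithm is also proposed.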

Similar resources

Explaining the Relationship Between Sticky of Expenses with Prediction Error of Profit in Tehran Stock Exchange

One of the basic assumptions of management accounting is that cost changes have a significant relationship with increases and decreases in the level of activity. After Anderson and his colleagues raised the issue of sticky costs, this assumption came under discussion. It means increases in costs by increasing the more activity level of reduction in costs is exchange for the reductio...

Modeling Critical Flow through Choke for a Gas-condensate Reservoir Based on Drill Stem Test Data

Gas-condensate reservoirs contain hydrocarbon fluids with characteristics between those of oil and gas reservoirs and a high gas-liquid ratio. Due to the large gas-liquid ratio, wellhead choke calculations using empirical equations such as Gilbert's may contain considerable error. In this study, using drill stem test (DST) data of a gas-condensate reservoir, coefficients of the Gilbert equation were modifi...

Yarn tenacity modeling using artificial neural networks and development of a decision support system based on genetic algorithms

Yarn tenacity is one of the most important properties in yarn production. This paper addresses modeling of yarn tenacity as well as optimally determining the amounts of the effective inputs to produce yarn with desired tenacity. The artificial neural network is used as a suitable structure for tenacity modeling of cotton yarn with 30 Ne. As the first step for modeling, the empirical data is col...

The Predictability Power of Neural Network and Genetic Algorithm from Fiems’ Financial crisis

Organizations are increasingly exposed to financial risk that can lead to bankruptcy and loss of business. This may lead to discontinuity in operations, increased legal fees, administrative costs and other indirect costs. Accordingly, the purpose of this study was to predict the financial crisis of Tehran Stock Exchange using neural network and genetic algorithm. This research is descripti...

Zoning hydraulic conductivity using different geostatistical methods (Case study Shavoor)

In studies for irrigation and drainage projects, it is necessary to extend the data from the sampling points to the network. Therefore, the state of hydraulic conductivity (K) in the surrounding area is estimated from the available observation-well data. The process of estimating values for locations where no information is available, based on the observed wells, is called spati...

Journal:
  • Inf. Sci.

Volume 211  Issue 

Pages  -

Publication date: 2012